Explicit Kernel Rewards Regression for data-efficient near-optimal policy identification

Authors

Daniel Schneegaß

Steffen Udluft

Thomas Martinetz

Abstract

We present the Explicit Kernel Rewards Regression (EKRR) approach, an extension of Kernel Rewards Regression (KRR), for Optimal Policy Identification in Reinforcement Learning. The method uses the Structural Risk Minimisation paradigm to achieve a high generalisation capability. This explicit version of KRR offers at least two important advantages. First, a near-optimal policy is found by solving a quadratic program, so no Policy Iteration techniques are necessary. Second, the approach allows for further constraints and certain regularisation techniques, such as those used in Ridge Regression and Support Vector Machines.
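
As a point of reference for the regularisation mentioned in the abstract, standard kernel ridge regression can be written as below. This is the generic textbook formulation, not the EKRR objective, which the abstract does not spell out. The symbols K (kernel Gram matrix), y (regression targets), \alpha (kernel coefficients) and \lambda (regularisation weight) are notation assumed here for illustration.

```latex
% Generic kernel ridge regression (illustrative; not the EKRR objective).
\min_{\alpha \in \mathbb{R}^n} \;
  \lVert K\alpha - y \rVert_2^2 + \lambda\, \alpha^\top K \alpha,
\qquad K_{ij} = k(x_i, x_j), \quad \lambda > 0

% Setting the gradient 2K\bigl((K + \lambda I)\alpha - y\bigr) to zero
% gives the closed-form minimiser
\alpha^\ast = (K + \lambda I)^{-1} y
```

The objective is quadratic in \alpha, so adding linear constraints on \alpha would turn it into a constrained quadratic program, the problem class the abstract refers to.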

Related resources

Improving Optimality of Neural Rewards Regression for Data-Efficient Batch Near-Optimal Policy Identification

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to give a less biased estimator of the Bellman Residual and to facilitate the regression character of NRR, we incorporate an improved, Auxiliared Bellman Residual [2] and provide, to the best of our knowledge, the first Neural Network based implementation of the novel Bellman Residual minimisati...

Kernel Rewards Regression: An Information Efficient Batch Policy Iteration Approach

We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method is able to obtain very useful policies after observing just a few state-action transitions. It considers the Reinforcement Learning problem as a regression task for which any appropriate technique may be applied. The use of kernel methods, e.g. the Support...

Development of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug

Introduction: Using artificial intelligence tools in pharmacogenomics is one of the latest bioinformatics research fields. One of the most important drugs whose initial therapeutic dose is difficult to determine is the anticoagulant warfarin. Warfarin is an oral anticoagulant that, due to its narrow therapeutic window and complex interrelationships of individual factors, the selection of its ...

Neural Rewards Regression for near-optimal policy identification in Markovian and partial observable environments

Neural Rewards Regression (NRR) is a generalisation of Temporal Difference Learning (TD-Learning) and Approximate Q-Iteration with Neural Networks. The method allows one to trade off between these two techniques as well as between approaching the fixed point of the Bellman iteration and minimising the Bellman residual. NRR explicitly finds a near-optimal Q-function without an algorithmic framework exce...

Publication date: 2007
